BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION 

[001] The invention relates to database systems. Specifically, the invention
relates to apparatus, systems, and methods for passing data between an Extensible
Markup Language (XML) document and a hierarchical database.



DESCRIPTION OF THE RELATED ART 

[002] Today, business applications increasingly rely on XML documents to 
exchange data. Generally, modern software applications communicate with each other 
over the Internet using XML documents as a common data interchange language for 
Business to Business (B2B) and Business to Consumer (B2C) communications. 
Technologies such as webservers, servlets, web applications, web services, and the like
generally rely in some fashion on data organized according to the Extensible Markup
Language specification.

[003] Typically, these same software applications then communicate the data in
the XML document to database servers for storage in a database. Generally, before an
XML document is stored in a database, the XML document is analyzed to ensure that the
XML document is a "valid" XML document. An XML schema is used to validate an
XML document. As used herein, references to "an XML document" mean that the XML
document is a valid XML document according to a predefined XML schema. Because an
XML document provides such flexibility in the organization and types of XML elements,
XML documents are validated to ensure that they are organized as expected. An invalid
XML document may lead to unpredictable or erroneous results in software modules using
the invalid XML document.

[004] An XML schema defines the structure, organization, and data types that 
are acceptable in all corresponding XML documents. The XML schema defines a set of 
XML elements, XML element attributes, and organization among the XML elements that 
is desired. The XML schema serves as a vocabulary for the XML elements. 
Consequently, the XML schema defines a superset of valid XML documents. The valid 
XML documents include one or more of the XML elements, XML attributes, and 
structure among the XML elements as defined in the XML schema. 

[005] Typically, prior to storing the XML document, the XML document is 
validated. Generally, two types of databases may store the data in the XML document, 
hierarchical or relational. Each type of database has different benefits and limitations, 
which will be discussed in more detail below. 

[006] Generally, the databases store data or an XML document in two different 
formats. In one aspect, the raw data contained in the elements of the XML document are 
removed from the XML document and stored in the database. Data stored in this manner 
is referred to herein as "decomposed" data because the formatting of the XML document 
is removed to store only the raw data. In another aspect, the raw data including the 
formatting that comprises the XML document are stored in the database. When the XML 
document is stored in the database in this manner, this is referred to herein as storing the 
XML document "intact" because the formatting of the XML document or an XML sub- 
tree is preserved within the database. 
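
By way of illustration only, the difference between the decomposed and intact formats might be sketched as follows. The sample document, its element names, and the functions `store_decomposed` and `store_intact` are hypothetical and are not part of the described apparatus.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
DOC = "<customer><name>Ada</name><city>London</city></customer>"

def store_decomposed(xml_text):
    """Keep only the raw data: one value per child element, markup discarded."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

def store_intact(xml_text):
    """Keep the document exactly as received, markup included."""
    return xml_text

decomposed = store_decomposed(DOC)
intact = store_intact(DOC)
```

In the decomposed form only the values survive, while the intact form can be handed back unchanged.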

[007] To control costs, it is desirable that modern technologies such as XML 
documents be capable of readily interfacing with existing computer and information 
technology without significantly modifying the existing computer and information 
technology. For example, large corporations, governments, and other entities continue to 
use legacy applications, which are software programs designed, written, and maintained
for large, mission-critical computers, such as mainframes. These entities have invested 
large amounts of work and money into developing and maintaining the legacy 
applications. In addition, these applications have been tested and refined to operate very 
efficiently and with minimal errors. Legacy applications continue to manage a high 
percentage of the everyday transactions and data for these businesses. 

[008] Similarly, many of these legacy applications continue to store and retrieve 
data using hierarchical databases, such as IBM's Information Management System (IMS), 
instead of common relational databases such as the Oracle database available from the 
Oracle corporation. To facilitate storing and retrieving data in XML documents (referred 
to herein as "XML data"), functionality for passing XML data between XML documents 
and relational databases has been developed. Generally, this functionality is integrated 
into the database servers for relational databases. Consequently, users' versions of the
database servers must be updated to enable support for passing of data between an XML
document and a relational database.

[009] Unfortunately, no tools, either standalone or integrated, exist for passing 
XML documents and/or XML data between an XML document and a hierarchical DB, 
one example of which is IMS. Consequently, one of two conventional solutions has been 
implemented depending on the circumstances. 

[010] One solution is to store the XML document either intact or decomposed in 
a native XML database. A native XML database is one which is designed and originally 
built to store and retrieve XML documents. One example of a native XML database is
the Tamino database available from the Software AG corporation of Darmstadt, Germany.
However, using a native XML database may require that two databases be maintained, 
the XML database as well as the hierarchical database. In addition, application specific 
software may need to be developed to move raw data between the XML database and the 
hierarchical database. Furthermore, the native XML databases may not yet include all the
standard features and functions of conventional hierarchical databases such as data 
backup, indexing, speed optimizations, and the like. 

[011] Another solution is to write specific software modules that read through a
specific XML document searching for elements of interest, retrieving the raw data and 
storing the raw data within the hierarchical database. Similarly, the software modules 
may be programmed to reproduce a specific XML document with the appropriate 
formatting and metadata for raw data within the hierarchical database. However, these 
software modules are inflexible and must be constantly revised as XML elements are 
removed, added, or modified for the XML document. In addition, developing such 
software may be difficult because the software must accommodate all valid XML 
documents for a specific XML schema. A software application may use a number of
different XML schemas, each of which requires a customized software module. Such
maintenance and development can become prohibitively expensive.

[012] Accordingly, a need exists for an apparatus, system, and method for
passing data between an XML document and a hierarchical database. The
apparatus, system, and method should allow for storage and retrieval of XML data and/or 
the XML document in a decomposed or intact format within a hierarchical database. In 
addition, the apparatus, system, and method should allow for indexing of an XML 
document or a sub-tree of the XML document when the XML document or sub-tree is 
stored in the hierarchical database in an intact format. The apparatus, system, and method 
should also allow for storage and retrieval of an XML document or a sub-tree of the XML 
document in a mixed format of decomposed and intact. Additionally, the apparatus, 
system, and method should allow for passing of data between an XML document and a 
hierarchical database without any changes to the functionality or software of the 
hierarchical database. Further, the apparatus, system, and method should interface with
the hierarchical database using standard external commands to the database.

BRIEF SUMMARY OF THE INVENTION

[013] The present invention has been developed in response to the present state 
of the art, and in particular, in response to the problems and needs in the art that have not 
yet been met for passing data between an XML document and a hierarchical database. 
Accordingly, the present invention has been developed to provide an apparatus, system, 
and method for passing data between an XML document and a hierarchical database that 
overcomes many or all of the above-discussed shortcomings in the art. 

[014] An apparatus according to the present invention includes a hierarchical 
database, a metadata schema, and a mapping module. The hierarchical database 
comprises a conventional hierarchical database, such as IMS, configured to provide 
standard features and functions of hierarchical databases such as security, data integrity, 
data backup, and the like. The metadata schema is derived from the hierarchical 
database. The metadata schema includes a first representation representative of the 
hierarchical structure of the hierarchical database, a second representation representative 
of the hierarchical structure of XML documents valid for passing into and out of the 
hierarchical database, one or more database field names, and one or more XML element 
names that map to the one or more database field names. The mapping module passes 
data between the XML document and the hierarchical database using the metadata 
schema. 

[015] In certain embodiments, the mapping module includes an input module, a 
matching module, a generator, a storage module, and an assembler. The input module 
receives an XML document for storage in the database or a query for retrieval of an XML 
document from the database. The matching module matches an XML element of the 
XML document with a metadata element defined in the metadata schema to store the 
XML document in the decomposed format. Similarly, the matching module matches 
each database field of the hierarchical database with a metadata element defined in the
metadata schema to retrieve the XML document from the database. 

[016] To retrieve a decomposed XML document from the database, the 
generator module generates an XML element defined by the matching metadata element 
identified by the matching module. The generated XML element includes content data 
from the matching database field. The assembler module assembles the generated XML 
elements into an XML document. 

[017] To store an XML document in decomposed format, the storage module 
stores content data from the XML element in a database field. The database field 
matches the metadata element defined in the metadata schema that matched the XML 
element. In certain embodiments, the storage module may also change the data type 
and/or encoding of the content data to correspond to the requirements of the database 
field. 
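
The matching, storage, generator, and assembler roles described above might be sketched as follows. This is a minimal sketch under stated assumptions: the `METADATA` table, the element and field names, and the use of a plain dictionary in place of a database node are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata: XML element name -> (database field name, field type).
METADATA = {
    "partNo": ("PARTNO", int),
    "descr": ("DESCR", str),
}

def store(xml_text, segment):
    """Matching + storage: copy each matched element's content into its field."""
    for elem in ET.fromstring(xml_text):
        match = METADATA.get(elem.tag)
        if match is None:
            continue  # no matching metadata element; skipped in this sketch
        field_name, field_type = match
        segment[field_name] = field_type(elem.text)  # convert to the field's type
    return segment

def retrieve(segment, root_name):
    """Matching + generator + assembler: rebuild an XML document from fields."""
    root = ET.Element(root_name)
    for elem_name, (field_name, _type) in METADATA.items():
        if field_name in segment:
            ET.SubElement(root, elem_name).text = str(segment[field_name])
    return ET.tostring(root, encoding="unicode")

segment = store("<part><partNo>42</partNo><descr>bolt</descr></part>", {})
xml_out = retrieve(segment, "part")
```

The dictionary stands in for a single database node; a real embodiment would issue the database's own external commands instead.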

[018] In one embodiment, the storage module cooperates with the input module 
to store the XML document in intact format in one or more database nodes of the 
hierarchical database. The XML document may be written directly to the database 
node(s) without any conversion or type comparison. Similarly, the storage module may 
be used to retrieve an XML document stored in intact format from the database node(s) of 
the hierarchical database in response to a key provided to the input module. The key 
uniquely identifies the XML document within the hierarchical database. 
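
Intact storage keyed by a unique identifier might be sketched as follows. The per-node capacity `CHUNK`, the key `"doc-1"`, and the dictionary standing in for the database nodes are assumptions made for illustration; the document text is written without parsing or conversion.

```python
CHUNK = 16  # hypothetical per-node capacity, in characters

def store_intact(nodes, key, xml_text):
    """Split the document text across one or more nodes, unchanged."""
    nodes[key] = [xml_text[i:i + CHUNK] for i in range(0, len(xml_text), CHUNK)]

def retrieve_intact(nodes, key):
    """Reassemble the document identified by the key."""
    return "".join(nodes[key])

doc = "<order><item>bolt</item></order>"
nodes = {}
store_intact(nodes, "doc-1", doc)
restored = retrieve_intact(nodes, "doc-1")
```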

[019] A system of the present invention is provided for passing data between an 
XML document and a hierarchical database. The system includes an interface, a database 
schema, an XML schema, and a mapping module. The interface allows for an XML 
document to be identified for retrieval or storage. The database schema and XML 
schema together may comprise a metadata schema similar in format and function to that 
referred to above in relation to one embodiment of an apparatus. The mapping module 
may function in similar manner to the mapping module referred to above in relation to
certain embodiments of an apparatus in accordance with the present invention. 

[020] A method of the present invention is also presented for passing data 
between an XML document and a hierarchical database. In one embodiment, the method 
includes providing a hierarchical database. Next, a metadata schema derived from the 
hierarchical database is provided. The metadata schema includes a first representation 
representative of the hierarchical structure of the hierarchical database, a second 
representation representative of the hierarchical structure of valid XML documents, a 
database field name, and an XML element name that maps to the database field name. 
The data is then passed between an XML document and the hierarchical database using 
the metadata schema. 

[021] The features and advantages of the present invention will become more 
fully apparent from the following description and appended claims, or may be learned by 
the practice of the invention as set forth hereinafter. 

BRIEF DESCRIPTION OF THE DRAWINGS

[022] In order that the advantages of the invention will be readily understood, a
more particular description of the invention briefly described above will be rendered by 
reference to specific embodiments that are illustrated in the appended drawings. 
Understanding that these drawings depict only typical embodiments of the invention and 
are not therefore to be considered to be limiting of its scope, the invention will be 
described and explained with additional specificity and detail through the use of the 
accompanying drawings, in which: 

[023] Figure 1 is a conceptual block diagram illustrating relational data 
structures for nodes in a relational database, a hierarchical database, and an XML 
document; 

[024] Figure 2 is a logical block diagram illustrating one embodiment of an 
apparatus in accordance with the present invention; 

[025] Figure 3 is a block diagram illustrating one embodiment of a metadata 
schema in accordance with the present invention; 

[026] Figure 4 is a schematic block diagram illustrating a system according to 
one embodiment of the present invention; 

[027] Figure 5 is a schematic block diagram illustrating sub-components of one 
embodiment of the system illustrated in Figure 4; 

[028] Figure 6 is a schematic flow chart diagram illustrating a method of the 
present invention for passing data between an XML document and a hierarchical database 
such that content data of the XML document is stored in the hierarchical database in a 
decomposed format; 

[029] Figure 7 is a schematic flow chart diagram illustrating a method of the 
present invention for passing data between an XML document and a hierarchical database 
such that content data of the XML document stored in the hierarchical database in a
decomposed format is retrieved into an XML document; 

[030] Figure 8 is a schematic flow chart diagram illustrating a method of the 
present invention for passing data between an XML document and a hierarchical database 
such that the XML document is stored in the hierarchical database in an intact format; and 

[031] Figure 9 is a schematic flow chart diagram illustrating a method of the 
present invention for passing data between an XML document and a hierarchical database 
such that an XML document stored in an intact format is retrieved from the hierarchical 
database. 

DETAILED DESCRIPTION OF THE INVENTION



[032] It will be readily understood that the components of the present invention, 
as generally described and illustrated in the figures herein, may be arranged and designed 
in a wide variety of different configurations. Thus, the following more detailed 
description of the embodiments of the apparatus, system, and method of the present 
invention, as presented in Figures 1 through 9, is not intended to limit the scope of the 
invention, as claimed, but is merely representative of selected embodiments of the 
invention. 

[033] Many of the functional units described in this specification have been 
labeled as modules, in order to more particularly emphasize their implementation 
independence. For example, a module may be implemented as a hardware circuit 
comprising custom VLSI circuits or gate arrays, off-the-shelf semiconductors such as 
logic chips, transistors, or other discrete components. A module may also be 
implemented in programmable hardware devices such as field programmable gate arrays, 
programmable array logic, programmable logic devices or the like. 

[034] Modules may also be implemented in software for execution by various 
types of processors. An identified module of executable code may, for instance, comprise 
one or more physical or logical blocks of computer instructions which may, for instance, 
be organized as an object, procedure, function, or other construct. Nevertheless, the 
executables of an identified module need not be physically located together, but may 
comprise disparate instructions stored in different locations which, when joined logically 
together, comprise the module and achieve the stated purpose for the module. 

[035] Indeed, a module of executable code could be a single instruction, or many 
instructions, and may even be distributed over several different code segments, among 
different programs, and across several memory devices. Similarly, operational data may 
be identified and illustrated herein within modules, and may be embodied in any suitable 
form and organized within any suitable type of data structure. The operational data may
be collected as a single data set, or may be distributed over different locations including 
over different storage devices, and may exist, at least partially, merely as electronic 
signals on a system or network. 

[036] Reference throughout this specification to "a select embodiment," "one 
embodiment," or "an embodiment" means that a particular feature, structure, or 
characteristic described in connection with the embodiment is included in at least one 
embodiment of the present invention. Thus, appearances of the phrases "a select 
embodiment," "in one embodiment," or "in an embodiment" in various places throughout 
this specification are not necessarily all referring to the same embodiment. 

[037] Furthermore, the described features, structures, or characteristics may be 
combined in any suitable manner in one or more embodiments. In the following 
description, numerous specific details are provided, such as examples of programming, 
software modules, user selections, user interfaces, network transactions, database queries, 
database structures, hardware modules, hardware circuits, hardware chips, etc., to provide 
a thorough understanding of embodiments of the invention. One skilled in the relevant 
art will recognize, however, that the invention can be practiced without one or more of 
the specific details, or with other methods, components, materials, etc. In other instances, 
well-known structures, materials, or operations are not shown or described in detail to 
avoid obscuring aspects of the invention. 

[038] The illustrated embodiments of the invention will be best understood by 
reference to the drawings, wherein like parts are designated by like numerals throughout. 
The following description is intended only by way of example, and simply illustrates 
certain selected embodiments of devices, systems, and processes that are consistent with 
the invention as claimed herein. 

[039] Figure 1 illustrates three exemplary diagrams, a hierarchical database
diagram 102, a relational database diagram 104, and an XML document diagram 106. 
The diagrams 102, 104, 106, illustrate the relationship between database nodes (and 
corresponding XML elements in the XML document). 

[040] Each diagram includes database nodes represented by the letters A-F. 
Each database node associates related data. Of course each database may include 
different terminology for the database node, fields within database nodes, and 
relationships between the nodes. For example, in a hierarchical database, such as IMS, a 
database node is referred to as a segment that includes one or more database fields storing 
raw data. In a relational database, the database node may correspond to a database table 
that includes one or more database fields. The database fields of a hierarchical and 
relational database correspond to XML sub-elements within an XML element of an XML 
document. 

[041] The XML document includes a root XML element that may include one or 
more XML sub-elements, which sub-elements may each include one or more
sub-elements. Those of skill in the art will recognize, based on the context, that references to
an XML element herein refer to either an XML root element or XML sub-element as
appropriate. Typically, the structure of XML sub-elements (nodes B-F) in relation to the
root XML element is represented by nesting XML sub-elements within begin and end 
tags of appropriate parent elements. The XML root element and XML sub-elements are 
organized in a parent-child relationship. Each parent node may have many child nodes. 
But, a child node may have only one parent node. This relationship constitutes a 
hierarchical relationship. 
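
The parent-child constraint can be illustrated with a short sketch. The nesting of elements A through F shown here is hypothetical and is not taken from Figure 1; the point is only that nesting gives every element other than the root exactly one parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting of elements A-F; not the arrangement of Figure 1.
DOC = "<A><B><D/><E/></B><C><F/></C></A>"

root = ET.fromstring(DOC)

# Each child is recorded once, under the single element that encloses it.
parent_of = {child.tag: parent.tag for parent in root.iter() for child in parent}
```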

[042] Advantageously, the hierarchical database is also structured according to 
the same parent-child relationship as required in an XML document. The hierarchical 
database includes a root node and one or more child nodes related to the root node. Each 
child node may also have one or more child nodes. Certain hierarchical databases have
been managing data according to the parent-child relationships for many years. 
Consequently, many complex and expensive software applications have been built around 
the speed, reliability, stability, and features such as indexing and data preservation 
provided by these hierarchical databases. 

[043] In contrast, the relational database diagram 104 illustrates database nodes 
A-F organized according to relationships that are not limited to strictly parent-child 
relationships. One reason relational databases have been widely used is that the relational 
database can represent many-to-many relationships between database nodes. By way of 
example, suppose database node D represents parts and database node E represents 
invoices. Typically, an invoice can include many parts and a single part can appear on 
many invoices. 

[044] Many-to-many relationships allow for the amount of data duplication in 
the database to be minimized to a higher extent than may be possible in a hierarchical 
database. However, as a consequence, queries for the data in many-to-many relationships 
may be slower, more complicated, and involve certain complex join queries.
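
The trade-off can be sketched with the parts and invoices example. All identifiers and sample values below are hypothetical. The relational form stores each part once and resolves membership through a link table, whereas the hierarchical form places a copy of each part under every invoice that lists it.

```python
# Relational form: each part is stored once; a link table carries the
# many-to-many relationship between invoices and parts.
parts = {1: "bolt", 2: "nut"}
invoices = {100: "ACME", 101: "Globex"}
invoice_parts = [(100, 1), (100, 2), (101, 1)]

def parts_on(invoice_id):
    """Resolve an invoice's parts through the link table (a join)."""
    return [parts[p] for (i, p) in invoice_parts if i == invoice_id]

# Hierarchical form: each invoice owns its own copies of its parts,
# so a part that appears on two invoices is stored twice.
hierarchical = {
    inv: {"customer": name, "parts": parts_on(inv)}
    for inv, name in invoices.items()
}

bolt_copies = sum(rec["parts"].count("bolt") for rec in hierarchical.values())
```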

[045] The structure among database nodes of the hierarchical database diagram 
102 allows for faster retrieval and storage of data than in the relational database diagram 
104. In addition, the database nodes of the hierarchical database diagram 102 follow the 
same parent-child relationship constraints. Consequently, an XML document (which is 
organized in a hierarchical fashion) with its XML root element and XML sub-elements is 
readily mapped to corresponding database nodes of the hierarchical database. 

[046] Note, however, that the hierarchical structure between the XML diagram 
106 and the hierarchical database diagram 102 does not match. For example, XML 
element B descends from root XML element A in the XML diagram 106 and database 
node B descends from database node F in the hierarchical database diagram 102. The 
present invention allows for data to be mapped between XML element B and the database
node B even though the hierarchical structures are not exactly the same. 
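
A sketch of how a name-based lookup can tolerate differing hierarchies follows. Only the two relationships stated above are encoded (B under A in the XML document, B under F in the database); the parent maps and the `path` helper are hypothetical.

```python
# Only the relationships stated in the text are encoded here.
xml_parent = {"B": "A"}  # XML element B descends from root element A
db_parent = {"B": "F"}   # database node B descends from database node F

def path(parent_of, name):
    """Walk upward from a node to the topmost known ancestor."""
    out = [name]
    while out[-1] in parent_of:
        out.append(parent_of[out[-1]])
    return list(reversed(out))

def locate(name):
    """Find the same name in both structures, whatever its position in each."""
    return path(xml_parent, name), path(db_parent, name)

xml_path, db_path = locate("B")
```

Because the lookup is by name, the two paths may differ without preventing the mapping.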

[047] Referring now to Figure 2, a logical block diagram illustrates major 
components in one embodiment of an apparatus 200 for passing data between an XML 
document 202 and a hierarchical database 204. The apparatus 200 includes a valid XML 
document 202, a hierarchical database 204, a mapping module 206, and a metadata 
schema 208. 

[048] The XML document 202 is a valid XML document. As mentioned above, 
this means that there exists an XML schema or Document Type Definitions (DTD) file 
that defines all the XML elements that may appear on the XML document 202, how those 
XML elements may be related in a parent-child hierarchy, data types for data in the XML 
elements, and an encoding format for the data. Preferably, the XML document 202 is 
validated by another tool prior to being provided for storage within the hierarchical 
database 204. 

[049] The hierarchical database 204 may be any standard hierarchical database. 
Preferably, the hierarchical database 204 is IMS. The hierarchical database 204 supports 
basic interface commands, such as get, insert, replace, delete, and all, for manipulating 
data of a single database node or database field. Preferably, the hierarchical database 204 
is not modified in any way to accommodate use of the hierarchical database 204 with the 
present invention. In this manner, data from XML documents stored in the hierarchical 
database 204 in decomposed format may be used by legacy applications and other users 
of the hierarchical database 204 without concern that the data was provided originally in 
an XML document 202. 

[050] The mapping module 206 maps data between the XML document 202 and 
the hierarchical database 204. In one embodiment, the mapping module 206 is external to 
the hierarchical database 204 and passes the data between the XML document 202 and 
the hierarchical database 204 using the metadata schema 208 and external database
commands. To store or retrieve data in decomposed and mixed decomposed and intact 
formats, the mapping module 206 relies on the metadata schema 208. If the whole XML 
document is to be saved in intact format, the mapping module 206 may not need the 
metadata schema 208. 

[051] The mapping module 206 and metadata schema 208 will be described in
more detail below. The metadata schema 208 includes the hierarchical structure of the 
XML document 202, the hierarchical structure of the hierarchical database 204, and one 
or more database field names that map to corresponding XML element names in the XML 
document 202. The mapping module 206 maps between XML elements in the XML 
document 202 and database nodes in the hierarchical database 204 by matching the XML 
element name to the database field name. Once the mapping has been made, the mapping 
module 206 performs any necessary type and/or encoding format conversions, and stores 
the data in the appropriate target. If an XML document 202 is being stored, the target is a 
database field in the database 204 at the appropriate database node. If an XML document 
202 is being retrieved, the target is a generated XML element stored in the XML 
document 202. 
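
The type and encoding conversion step might look like the following sketch. It assumes, purely for illustration, that the target database field is a fixed-length field encoded in EBCDIC code page 037 (Python's `cp037` codec); the field length, the codec choice, and the helper names are hypothetical.

```python
def to_field(value, length, codec="cp037"):
    """Encode a value for a fixed-length field, padding with encoded spaces."""
    raw = str(value).encode(codec)
    if len(raw) > length:
        raise ValueError("value does not fit the field")
    return raw.ljust(length, " ".encode(codec))

def from_field(raw, py_type=str, codec="cp037"):
    """Decode a field, drop the padding, and convert to the requested type."""
    return py_type(raw.decode(codec).rstrip(" "))

stored = to_field("bolt", 8)
number = from_field(to_field(42, 8), int)
```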

[052] In Figure 3, one embodiment of a metadata schema 300 is illustrated. As 
mentioned, the metadata schema 300 allows data in one hierarchical structure to be 
mapped to another hierarchical structure. In one embodiment, the metadata schema 300 
comprises a document schema 302 associated with the XML document 202 and a 
database schema 304 associated with the hierarchical database 204. The schemas 302, 
304 comprise metadata relating respectively to XML elements, database fields, data 
types, data encoding, as well as the hierarchical structure of the XML document 202 and 
hierarchical database 204. 

[053] Those of skill in the art will recognize that the metadata in the schemas
302, 304 may be organized and formatted according to any format including proprietary 
formats. The document schema 302 associated with the XML document 202 may 
comprise a listing of XML elements and the data types for the XML elements in a name- 
value pair arrangement. The structure of XML elements may be represented by lists of 
element names. The lists may include sub-lists of XML element names that represent the 
parent-child relationships. 
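
One possible reading of the name-value and list/sub-list arrangement is sketched below. The element names, the type labels, and the convention that a sub-list holds the children of the name immediately preceding it are assumptions made for illustration.

```python
# Name-value pairs: XML element name -> data type (hypothetical names).
element_types = {"order": "complex", "orderNo": "int",
                 "item": "complex", "sku": "string"}

# List/sub-list structure: a sub-list holds the children of the name before it.
structure = ["order", ["orderNo", "item", ["sku"]]]

def parent_map(struct, parent=None, out=None):
    """Flatten the list/sub-list structure into a child -> parent mapping."""
    out = {} if out is None else out
    last = None
    for entry in struct:
        if isinstance(entry, list):
            parent_map(entry, last, out)  # children of the preceding name
        else:
            if parent is not None:
                out[entry] = parent
            last = entry
    return out

parents = parent_map(structure)
```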

[054] In a certain embodiment, the database schema 304 associated with the 
hierarchical database 204 is a schema used by a variety of software applications accessing 
the hierarchical database 204. For example, the hierarchical database 204 may comprise 
an IMS database. Typically, IMS itself includes no metadata for the database nodes and 
database fields. Instead, metadata for an IMS database may be stored and maintained by 
other components such as Java classes. The Java classes may define the database nodes, 
database fields, and hierarchical structure between database nodes in the IMS database. 
These Java classes may be defined within a file or instantiated into Java objects that are 
referenced to provide the features of the database schema 304. 

[055] In one embodiment, the database schema 304 includes database field
names 306 and associated database field types for database fields in the database 204. In
addition, the document schema 302 associated with the XML document 202 preferably
includes XML element names 308 that match the database field names 306 in the
database schema 304 associated with the database 204. Preferably, there is a one to one
correspondence 309 between database field names 306 and XML element names 308.

[056] The metadata schema 300 also includes a first representation 310
representative of the hierarchical structure of the hierarchical database 204 and a second
representation 312 representative of the hierarchical structure of valid XML documents
202 that may be stored and retrieved in decomposed format from the hierarchical
database 204. The first representation 310 and second representation 312 may be any
data structure capable of capturing a parent-child hierarchical relationship.

[057] In one embodiment, the first representation 310 comprises a list-sub-list 
structure within a class defined in a set of Java classes that make up the database schema 
304 associated with the database 204. The second representation 312 resides in the 
document schema 302 associated with the XML document 202. The second 
representation 312 may comprise a nested structure of markup language tags as used in an 
XML schema (XSD) file. 

[058] Preferably, the document schema 302 associated with the XML document 
202 is an XSD file (an XML schema 302) generated based on the hierarchical database 
204. Although the XML schema 302 is preferably generated from the hierarchical 
database 204, the first representation 310 and second representation 312 may not 
necessarily match. However, the hierarchical structure of the hierarchical database 204 
matches the hierarchical structure of the first representation 310 and the hierarchical 
structure of the XML document 202 matches the hierarchical structure of the second 
representation 312. 
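
Generating an XSD from a description of the database might be sketched as follows. The `SEGMENT` description, its segment and field names, and the choice to nest child segments inside the parent's sequence are hypothetical; the text does not specify how the generation is performed.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical description of a database node (segment), its fields, and children.
SEGMENT = {
    "name": "PART",
    "fields": {"PARTNO": "xs:int"},
    "children": [{"name": "STOCK", "fields": {"QTY": "xs:int"}, "children": []}],
}

def to_xsd_element(seg):
    """Emit one xs:element per segment, with fields and child segments nested."""
    el = ET.Element("xs:element", name=seg["name"])
    seq = ET.SubElement(ET.SubElement(el, "xs:complexType"), "xs:sequence")
    for field_name, field_type in seg["fields"].items():
        ET.SubElement(seq, "xs:element", name=field_name, type=field_type)
    for child in seg["children"]:
        seq.append(to_xsd_element(child))
    return el

schema = ET.Element("xs:schema", {"xmlns:xs": XS_NS})
schema.append(to_xsd_element(SEGMENT))
xsd_text = ET.tostring(schema, encoding="unicode")
```

Here the generated nesting mirrors the database hierarchy, although, as noted above, the two representations need not match.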

[059] Figure 4 illustrates a system 400 for passing data between a valid XML 
document 202 and a hierarchical database 204. The system 400 includes an XML 
document 202, mapping module 206, and hierarchical database 204 very similar to those
components discussed in relation to the embodiment of Figure 2.

[060] In addition, the mapping module 206 uses an XML schema 302 and
database schema 304 similar to those described in relation to Figure 3. Specifically, a
document schema 302 comprises an XML schema 302 that complies with the standard
XML schema format version 1.0 as set forth by the World Wide Web Consortium. The
XML schema 302 includes a representation of the hierarchical structure of valid,
well-formed XML documents 202. A well-formed XML document 202 is one which includes
the syntax, semantics, and data content in accordance with the current XML specification.

[061] In addition, the database schema 304 comprises Java classes defined for 
database nodes and database fields of a hierarchical database 204. The Java classes may 
comprise all or part of a predefined database schema embodied as Java classes. For 
example, the Java classes may comprise one or more Java classes in the IMS Java 
Application Programming Interface (API) available from IBM. 

[062] The database schema 304 includes a representation of the hierarchical 
structure of the hierarchical database 204, or a sub-tree thereof. The database schema 304 
and XML schema 302 are configured such that for every database field name in the 
database schema 304 there exists a corresponding XML element name in the XML 
schema 302. 
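
As an illustration only, the correspondence described above can be checked mechanically. The following Python sketch assumes the database field names and XML element names have already been extracted into plain collections. The function name `missing_element_names` and the sample names are hypothetical and are not part of the described embodiment.

```python
def missing_element_names(db_field_names, xml_element_names):
    """Return database field names that lack a same-named XML element."""
    xml_names = set(xml_element_names)
    return [name for name in db_field_names if name not in xml_names]


# Hypothetical sample names, chosen only for illustration.
db_fields = ["Make", "Model", "CarYear"]
xml_elements = ["Make", "Model", "CarYear", "Dealer"]

# An empty result means every database field name has an XML counterpart.
unmatched = missing_element_names(db_fields, xml_elements)
```

The check is deliberately one-directional: the XML schema may contain extra element names, but every database field name must be covered.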

[063] The system 400 includes an interface 402. The interface 402 receives
commands for passing of data between an XML document 202 and a hierarchical
database 204. The interface 402 may interact with other software applications or directly
with end users. An XML document 202 may be stored or retrieved in response to a
command issued to the interface 402. In one embodiment, the command may also
include an indicator as to whether the XML document 202 is to be stored or retrieved in a
decomposed format, an intact format, or a combination of decomposed format and intact
format. Preferably, indicators for decomposed, intact, and combined decomposed and
intact formats are embedded within a metadata schema 300 (See Figure 3) such as within
the XML schema 302.

[064] Figure 5 illustrates the mapping module 206 and interface 402 in more
detail. Preferably, the whole XML document 202 is sent to the hierarchical database 204
in response to a command issued to the interface 402. Similarly, either a whole
hierarchical database or a sub-tree thereof is retrieved into an XML document in response
to a command issued to the interface 402.

[065] The interface 402 may include an input module 502. Alternatively, the 
input module 502 may be located within the mapping module 206. The input module 502 
may comprise a command line or graphical user interface that allows an end user to pass 
data between the XML document 202 and the hierarchical database 204. In one 
embodiment, the interface 402 comprises an extension to existing technology. For 
example, the interface 402 may comprise new user-defined function (UDF) extensions
for a structured query language such as, but not limited to, Structured Query Language
(SQL). In this manner, interaction with the interface 402 may be consistent and well
understood, which minimizes the learning curve for using the interface 402.

[066] The input module 502 may be configured to receive an XML document 
202 for storage in the hierarchical database 204. Preferably, the XML document 202 is 
valid, meaning the XML document 202 includes XML elements structurally organized 
according to the metadata schema 300 (See Figure 3). The XML document 202 may be 
provided by a file system, a web service, or another software module. 

[067] The input module 502 is also configured to receive a query to retrieve an
XML document 202 from the hierarchical database 204. The query may comprise a key
that uniquely identifies a database node in the hierarchical database 204 that is to be the 
root element in the retrieved XML document 202. Alternatively, the query may comprise 
a set of commands organized according to SQL. 

[068] The input module 502 communicates the XML document 202 or the query 
to the mapping module 206. The mapping module 206 may include a matching module 
504 and a storage module 506 that cooperate to store content data in the XML document 
202 within the proper database nodes and database fields of the hierarchical database 204. 
In one embodiment, the matching module 504 traverses the hierarchical tree structure of 
the XML document 202. Preferably, the traversal begins at the root XML element and
proceeds according to a depth-first methodology. 

[069] For each XML element, the matching module 504 finds a corresponding 
metadata element within the metadata schema 208 (See Figure 2). Preferably, the 
matching module 504 matches an XML element name with a database field name. The 
match may be a case-sensitive or case-insensitive textual match. Of course, the matching 
module 504 may use other criteria in addition to, or in place of, the database field name 
and XML element name. 
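
The name match described above can be sketched as follows. This is a minimal illustration in Python, not the matching module 504 itself. The function `find_field_name` and its `case_sensitive` parameter are hypothetical names introduced here.

```python
def find_field_name(element_name, db_field_names, case_sensitive=True):
    """Return the database field name matching an XML element name, or None."""
    for field_name in db_field_names:
        if case_sensitive:
            if field_name == element_name:
                return field_name
        elif field_name.casefold() == element_name.casefold():
            # casefold() gives a case-insensitive textual comparison.
            return field_name
    return None


fields = ["Make", "CarYear"]
strict = find_field_name("caryear", fields)                        # no match
relaxed = find_field_name("caryear", fields, case_sensitive=False)
```

Returning the field name as stored in the database schema, rather than the element name, keeps later database commands independent of how the XML document capitalizes its tags.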

[070] Once a match is identified by the matching module 504, a storage module
506 extracts the content data from the matching XML element. The content data may
comprise data between the begin and end tags of the XML element as well as attributes
listed in name-value pairs within the begin tag of the XML element. The storage module
506 stores the content data in the appropriate database field of the hierarchical database
204. The appropriate database node is identified by locating the matching database field
within the first representation 310 of the hierarchical structure of the hierarchical database
204. In one embodiment, the storage module 506 issues an external database command,
such as a replace command, to store the content data into the database field of the
appropriate database node of the hierarchical database 204. The database field is
identified by the matching database field name provided by the matching module 504.

[071] If an XML element includes attributes, the matching module 504 finds the
corresponding database fields in the hierarchical database 204 using the matching
metadata element and provides the database field name(s) for the attributes to the storage
module 506. The storage module 506 may then issue insert commands to store the values
of the attributes in database fields associated with the attributes of the XML element.
Typically, because there is a one-to-one relationship between the XML element and the
attributes, the values for the attributes are stored in database fields of a particular database
node in the hierarchical database 204.

[072] The matching module 504 and storage module 506 continue to process 
each XML element in the XML document 202 until all XML elements of the XML 
document 202 have been processed and stored. In this manner, the data of the XML 
document is stored in a decomposed format in the hierarchical database 204. 
Decomposed storage may be particularly useful where the XML document 202 includes 
significant amounts of content data or the XML document is used primarily to transport 
data. In decomposed format, the content data is treated like any other data in the 
hierarchical database 204. Consequently, the data may be searched, indexed, and backed 
up as needed. 
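
The decomposed storage loop described above may be easier to follow with a small sketch. The Python example below is an assumption-laden illustration: a plain dict stands in for the hierarchical database 204, a set of names stands in for the metadata schema, and the function `store_decomposed` is hypothetical. It uses the standard `xml.etree.ElementTree` module, whose `Element.iter()` visits elements in document (depth-first) order.

```python
import xml.etree.ElementTree as ET


def store_decomposed(xml_text, field_names, database):
    """Walk the document depth-first and store matched content in `database`.

    `database` is a plain dict standing in for the hierarchical database:
    it maps a database field name to the list of values stored under it.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():          # document order, i.e. depth-first
        if element.tag not in field_names:
            continue                     # no matching metadata element
        text = (element.text or "").strip()
        if text:
            database.setdefault(element.tag, []).append(text)
        # Assumption: an attribute is stored in a field named after it.
        for attr_name, attr_value in element.attrib.items():
            if attr_name in field_names:
                database.setdefault(attr_name, []).append(attr_value)
    return database


document = (
    '<Model id="M1">'
    "<Make>Ford</Make>"
    "<CarYear>1989</CarYear>"
    "</Model>"
)

db = store_decomposed(document, {"Model", "Make", "CarYear", "id"}, {})
```

Once stored this way, each value is an ordinary field value, which is why it can be searched and indexed like any other data.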

[073] In one embodiment, the matching module 504 includes an analysis module 
508 that is activated when the XML document 202 is stored in intact format. The 
analysis module 508 will be discussed in more detail below in relation to Figure 8. 

[074] If the input module 502 receives a query, an XML document 202 is to be
retrieved from the hierarchical database 204. Typically, the query is in the form of a SQL
statement. The "where" clause and "from" clause of the SQL statement may include
expressions normally accepted by the hierarchical database 204. The "select" clause may
invoke the input module 502 to retrieve an XML document 202 from data in the
hierarchical database stored in decomposed or native format. The XML document 202
may be one stored in decomposed format earlier or a new XML document 202 generated
from data stored in the hierarchical database 204 from another source.

[075] The input module 502 provides the query to the matching module 504,
which locates a database node identified by the query. The database node may be the root
node of the hierarchical database 204 or a sub-node of the database 204. If the database
node is a sub-node, the retrieved XML document 202 will comprise a sub-tree of the
hierarchical database 204 that includes the identified database node and all descendent
database nodes.

[076] From the identified database node, the matching module 504 traverses the 
hierarchical database 204 or sub-tree of the database using a depth-first search.
Alternatively, the database sub-tree may be traversed using a breadth-first search. The 
matching module 504 matches each database field of the sub-tree in the hierarchical 
database 204 with a metadata element in the metadata schema 208 (See Figure 2). 

[077] Preferably, the matching module 504 traverses the hierarchical database 
204 or sub-tree by making external calls to the database server/engine. In a relational 
database, such traversal would require dynamically adjusting a potentially complex query. 
Advantageously, because the database nodes desired for building the XML document 
202 are in a hierarchical arrangement like the database nodes in the hierarchical database 
204, database node access functions of the hierarchical database 204 may be used through 
external calls to readily traverse the sub-tree and access each database node. For 
example, in an IMS database 204 the matching module 504 may issue "Get Next In 
Parent - GNP" calls to traverse the sub-tree. The IMS database 204 manages locating the 
next database node instead of the matching module 504. 

[078] The matching module 504 communicates a matching metadata element to 
a generator module 510. The generator module 510 generates an XML element according 
to the XML element definition included in the metadata element. The XML element 
comprises the XML formatting characters, keywords, and tokens for defining a valid 
XML element. 

[079] In addition, the XML element includes content data that is retrieved from 
the matching database field in the hierarchical database 204. If necessary, the generator 
module 510 may also perform a type conversion on the data from the database field as 
well as an encoding conversion. In certain embodiments, the matching metadata element 
includes an indicator as to whether the database field data is regular content data or an
attribute in a name-value pair of the XML element. If the indicator is present, the 
generator module 510 produces the appropriate name-value pair for the attributes section 
of the XML element. 
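
A minimal Python sketch of this generation step follows. It assumes each database field arrives as a (field name, value, attribute indicator) triple and that non-attribute fields become child elements of the node's element. Both the triple layout and the function `generate_element` are hypothetical and serve only to illustrate the role of the indicator.

```python
import xml.etree.ElementTree as ET


def generate_element(element_name, fields):
    """Build one XML element from (field_name, value, is_attribute) triples."""
    element = ET.Element(element_name)
    for field_name, value, is_attribute in fields:
        if is_attribute:
            # Indicator present: emit a name-value pair in the begin tag.
            element.set(field_name, str(value))
        else:
            # Regular content data: emit a child element holding the value.
            child = ET.SubElement(element, field_name)
            child.text = str(value)
    return element


model = generate_element(
    "Model",
    [("id", "M1", True), ("Make", "Ford", False), ("CarYear", 1989, False)],
)
xml_text = ET.tostring(model, encoding="unicode")
```

The `str(value)` calls stand in for the type conversion mentioned above; a fuller implementation would consult the data type identifiers in the metadata element.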

[080] The generated XML element may then be provided to an assembler 512. 
The assembler 512 assembles all the generated XML elements into a single XML 
document 202 once the final XML element is generated. The assembler 512 structures 
the XML elements according to the second representation 312 of the hierarchical 
structure of valid XML documents 202 included in the metadata schema 208. 

[081] Figure 6 illustrates a flow chart of a method 600 for storing XML 
documents 202 in a decomposed format within a hierarchical database 204. Preferably, 
the hierarchical database 204 has not been modified to accommodate the method 600. 
The method 600 begins 602 when an XML document 202 is provided for storing in the 
hierarchical database 204. 

[082] In one embodiment, the XML document 202 is first validated and parsed 
604. The XML document 202 is validated against an XML schema. The XML schema 
defines the structure, content, and semantics of all valid XML documents. Validation and 
parsing of the XML document 202 ensures that all required data is provided and that 
provided data is in the proper format and structure. 

[083] Next, an XML element is selected 606 from the parsed XML elements of 
the XML document 202 according to a depth first traversal of XML elements structured 
according to the second representation 312 of the hierarchical structure of valid XML 
documents 202 included in the XML schema. Alternatively, an XML element may be 
selected 606 by a depth first or breadth first search of the validated XML document 202. 

[084] Then, the selected XML element is matched 608 with a metadata element 
defined in the metadata schema 208 (See Figure 2). Preferably, the matching metadata 
element includes an XML element data type identifier that indicates the data type for the
content data in the XML element and a database field type identifier that indicates the 
data type for the data stored in the database field. If there is a data type mismatch, the 
content data is converted to the database type for the database field. Similarly, if the 
database field is stored in a different encoding format from that of the XML element 
content data, an encoding conversion may be performed. For example, XML data is 
typically stored in Unicode encoding format and IMS database data is typically stored in 
Extended Binary Coded Decimal Interchange Code (EBCDIC) encoding format. 
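
The type and encoding conversions can be illustrated with a short Python sketch. It assumes the EBCDIC code page is the one Python's standard library exposes as `cp037`; an actual installation may use a different EBCDIC code page. The function names and the `"int"` type label are hypothetical.

```python
def to_database_bytes(content, field_type, encoding="cp037"):
    """Convert XML text content to the field's type, then encode it."""
    if field_type == "int":
        # int() rejects non-numeric text and normalizes surrounding spaces.
        value = str(int(content))
    else:
        value = content
    return value.encode(encoding)


def from_database_bytes(raw, encoding="cp037"):
    """Decode bytes read from a database field back into Unicode text."""
    return raw.decode(encoding)


stored = to_database_bytes("1989", "int")   # EBCDIC digits 0xF1 0xF9 0xF8 0xF9
restored = from_database_bytes(stored)
```

The same pair of functions covers both directions, so the retrieval path can reverse the conversion applied during storage.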

[085] Finally, the properly encoded content data of the appropriate data type 
from the XML element is stored 612 in the database field within the database. The 
database node for the database field is identified by the matching metadata element. 
Then, a determination 614 is made whether more XML elements have yet to be processed 
and stored. If so, the method 600 selects a next XML element. If not, the method 600 
ends 616. 

[086] Figure 7 illustrates a flow chart of a method 700 for retrieving an XML 
document 202 from a hierarchical database 204. The XML document 202 may have been 
previously stored in decomposed format or comprise data fields populated by another 
database transaction. In this manner, native data in the hierarchical database 204 may be 
extracted and packaged in an XML document 202 as necessary. 

[087] The method 700 begins 701 by receiving 702 a query. Preferably, the 
query is in the form of a standard SQL statement that includes reference to a User- 
Defined Function (UDF). The query may be provided by a user or a software module. 
An example query may be: "SELECT retrieveXML(Model) FROM DealerDB.Model
WHERE Model.CarYear=1989". In the example, the "retrieveXML(Model)"
expression initiates the method 700. The "Model" argument identifies a database root
node in the hierarchical database 204.

[088] Next, the root node in the database 204 is located 704. In the example
above, the root node is the database node named "Model". Typically, the sub-tree of the 
hierarchical database 204 beneath the root node is processed to generate the XML 
document 202. 

[089] Initially, the root node is selected 706. Subsequent processing of the sub- 
tree selects child nodes of the root node. Next, each database field of the database node is 
matched 708 to a metadata element in the metadata schema 208. In certain embodiments, 
a matching module 504 may match the database field name to a metadata element name 
to identify the matching metadata element. As discussed above, the matching module 
504 may traverse the hierarchical database 204 or sub-tree using external commands to 
the hierarchical database 204 which utilize built in tree-traversal functions of the 
hierarchical database 204. 

[090] Then, an XML element is generated 710 as defined in the matching 
metadata element. The XML element comprises content data from the matching database 
field. The content data may be converted to a suitable XML data type and/or encoding 
format if necessary. The data type information and encoding format information may be 
indicated by identifiers in the matching metadata element. 

[091] In one embodiment, the generated XML element is written out 712 to a 
file or other persistent storage location. Alternatively, the XML element may be written 
to temporary storage such as memory. Typically, the XML element is written out 712 
according to a hierarchical structure dictated by the second representation 312 (See Figure 
3) of the hierarchical structure of valid XML documents 202. End tags for the generated 
XML elements may be written out once the whole sub-tree has been processed. 
Alternatively, the end tags are written out with the remainder of the XML element, and 
nested XML elements are simply inserted at the appropriate location in the XML 
document 202. 

[092] Finally, a determination 714 is made whether all of the database nodes of
the sub-tree have been processed. If there are more database nodes, the method 700 
returns to step 706. If not, the method 700 ends 716. 
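
The retrieval loop of method 700 can be sketched in Python as a recursive walk over the sub-tree. The sketch assumes database nodes are available as nested dicts with `name`, `fields`, and `children` keys, and that each database field maps to a same-named child element. These are illustrative assumptions, and `node_to_element` and the `Order` node are hypothetical.

```python
import xml.etree.ElementTree as ET


def node_to_element(node):
    """Recursively convert a database node (a dict) into an XML element."""
    element = ET.Element(node["name"])
    for field_name, value in node["fields"].items():
        ET.SubElement(element, field_name).text = str(value)
    for child in node.get("children", []):
        element.append(node_to_element(child))   # depth-first descent
    return element


model_node = {
    "name": "Model",
    "fields": {"Make": "Ford", "CarYear": 1989},
    "children": [
        {"name": "Order", "fields": {"OrderNo": 7}, "children": []},
    ],
}

xml_text = ET.tostring(node_to_element(model_node), encoding="unicode")
```

Because child elements are appended inside their parent before serialization, end tags land in the right place without being tracked separately, which corresponds to the second alternative described for step 712.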

[093] Figure 8 illustrates a flow chart of a method 800 for storing XML 
documents 202 in an intact format within a hierarchical database 204. An intact format 
means that the metadata (formatting characters, strings, version identifiers, and the like) 
within the XML document 202 is preserved along with content data once the XML 
document 202 is stored in the hierarchical database 204. 

[094] The intact storage format may be used in a variety of circumstances. For
example, intact storage may be used when the XML document 202 does not contain data
that needs to be loaded into standard database fields so that standard non-XML
applications can access the data.
Intact storage may be useful when the content data of the XML document 202 is variable, 
unknown, or significantly larger than the size of database fields in the database 204. 

[095] For example, the content data may comprise pages and pages of content 
data representative of a human-readable document such as a user manual. Such content 
data may not need to be stored within a standard database field. Instead, it may be 
desirable that the whole XML document 202 be stored in the hierarchical database 204 to 
utilize the backup and recovery, security, and other features of the database 204. 

[096] Intact storage may be desired when retrieval speed of the whole XML 
document 202 is important. Alternatively, intact storage may be used where no document
schema 302 associated with the XML document 202 exists. For example, no XML
schema 302 may exist for the XML document 202.

[097] In one embodiment, the method 800 begins 801 once an XML document 
202 is received preferably by way of a command identifying the location of the XML 
document 202. In addition, the command identifying the location of the XML document 
202 may provide a database node identifier. The database node identifier uniquely 
identifies a database node within the hierarchical database 204 to receive the XML
document 202. Preferably, the database node identifier identifies either a root node of
a new database or a new database node that has been added by extending an existing
database 204.

[098] In certain embodiments where database nodes are of set, predefined sizes, 
the new database node is of a particular type that restricts the new database node to a 
single child node. Similarly, the child node may be restricted to having only one child 
node. The new database node may include a flag indicating whether the database node
has a child, a grandchild, a great-grandchild, etc. Alternatively, where database nodes may be of
variable size, a single database node may be created of a size sufficient to store the entire 
XML document 202 in intact format. 

[099] First, a first database node is initialized 802. The first database node is 
preferably the newly created database node identified by the database node identifier. 
Initializing the first database node may comprise determining the total length of the XML 
document 202, and determining how many generations of child database nodes will be 
required to store the XML document 202 intact. If the length of the XML document 202 
exceeds the size of the first database node, a flag in the first database node is set to 
indicate that one or more generations of child database nodes exist. These child database 
nodes may be referred to as overflow nodes. In addition to setting the flag indicating 
additional overflow nodes, initialization 802 may include storing version information, the 
length of the portion of the database node that will hold the raw data, and the like. In 
certain embodiments, initialization includes creating the appropriate number of child 
database nodes, overflow nodes, in the hierarchical database 204 to properly store the 
XML document 202 in intact format. 

[0100] Initialization 802 may also include identifying one or more break points 
within the XML document 202. The break points represent where the XML document 
202 will be physically divided between the first database node and any subsequent child
database nodes. Break points are determined based on the size of the raw data in the 
XML document 202 and the sizes of the first database node and any child database nodes, 
excluding any flag or header information. 
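
One way to compute such break points is sketched below in Python. The sketch assumes a single usable capacity for the first database node and a single usable capacity for every overflow node, both measured in bytes after flag and header information is excluded. The function names and the sample sizes are hypothetical.

```python
def find_break_points(document_length, first_capacity, overflow_capacity):
    """Return offsets where the raw document is split across nodes."""
    if first_capacity <= 0 or overflow_capacity <= 0:
        raise ValueError("node capacities must be positive")
    break_points = []
    offset = first_capacity
    while offset < document_length:
        break_points.append(offset)
        offset += overflow_capacity
    return break_points


def split_at(raw, break_points):
    """Split raw bytes into per-node portions at the given offsets."""
    starts = [0] + break_points
    ends = break_points + [len(raw)]
    return [raw[s:e] for s, e in zip(starts, ends)]


raw = b"<doc>" + b"x" * 20 + b"</doc>"      # 31 bytes of raw data
points = find_break_points(len(raw), first_capacity=10, overflow_capacity=8)
portions = split_at(raw, points)
```

The number of break points equals the number of overflow nodes required, so the same calculation also answers how many generations of child database nodes must be created during initialization.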

[0101] Next, the method 800 writes 804 a portion of raw data from the XML 
document 202 into the first database node. Error checking on the written portion may be 
performed. Raw data means the data has not been parsed, validated, or converted. The 
portion may comprise a length packet such as 8 bit, 16 bit, 32 bit, or the like. 
Alternatively, the portion may be the same size as the length of the portion in the database 
node that will receive the raw data. 

[0102] Portions of the XML document 202 are written beginning with the 
beginning of the XML document 202. Once a portion is written, a determination 806 is 
made whether the first database node is full. If not, a next sequential portion of the XML 
document 202 is written 804 to the first database node. 

[0103] If the first database node is full, a determination 808 is made whether the
end of the XML document 202 has been reached. If so, the method 800 ends 810. If not,
a second database node that is a child of the first database node is initialized 812.
Typically, database nodes are filled once a break point is reached. Consequently, the
method 800 continues by writing 804 a portion of raw data beginning at the break point
from the XML document 202 into the second database node. The writing process
continues until the whole XML document 202 is stored in one or more database nodes of
the hierarchical database 204.

[0104] Referring now to Figures 6 and 8, in certain embodiments, an XML
document 202 may be stored in decomposed format with a sub-tree of the XML
document 202 stored intact. Preferably, the XML document 202 includes an associated
document schema 302 such as an XML schema 302. The document schema 302 may
include different types of directive metadata elements. A metadata directive is an
indicator that causes the XML element and/or a sub-tree, including the XML element as
the root element, to be handled differently in being passed between the XML document
202 and the hierarchical database 204.

[0105] In one embodiment, a metadata directive element signals that the XML 
element within which the metadata directive element is encountered is to be stored in 
intact format. For example, when the method 600 selects 606 an XML element, a 
determination may be made whether the XML element includes any metadata directives. 
If a metadata directive is present for storing the selected XML element and its 
descendents intact, the method 600 may initiate the method 800. The method 800 may 
operate as described above except that the root XML element from which intact storage 
begins is the selected 606 XML element from method 600 rather than the root XML 
element for the whole XML document 202. 

[0106] Similarly, when a decomposed XML document 202 is retrieved from the
hierarchical database 204 as described in relation to Figure 7, the same metadata directive
may be used to determine that a particular database node is to be retrieved according to a 
method 900 for retrieving intact XML documents 202 or sub-trees. Method 900 is 
discussed in more detail in relation to Figure 9. In this manner, XML documents 202 
stored using a mixed format of intact and decomposed may also be retrieved as necessary. 

[0107] Intact storage of an XML document 202 results in binary data in the 
database nodes of the hierarchical database 204. The binary data is not available for use 
by other applications using the hierarchical database 204 until the XML document 202 is 
retrieved. However, it may be desirable for certain information within the XML 
document 202 to be made available such that XML-enabled applications using the 
hierarchical database 204 may identify and/or locate the XML document 202 as
necessary.

[0108] Consequently, in certain embodiments, particular XML elements of an
XML document 202 being stored in intact format may be stored in indexable database 
nodes. In a hierarchical database 204 such as IMS, these indexable database nodes may 
comprise side segments. 

[0109] Referring now to Figures 6 and 8, as each XML element is selected 606, a 
determination may be made whether the XML element includes any metadata directives. 
In one embodiment, the analysis module 508 may examine each XML element to search 
for metadata directives. In addition, the analysis module 508 may communicate with 
other modules of the present invention to carry out the metadata directive depending on 
the type of metadata directive found. 

[0110] If a metadata directive is within the selected XML element and the 
metadata directive comprises an index indicator, all or a portion of the XML element may 
be stored in indexable database nodes such as side segments. The index indicator may 
include parameters that identify what parts of the XML element are to be stored in the 
indexable database nodes. The index values from the XML element (content data and/or
attribute values) are then stored in an indexable database node.

[0111] Then, a secondary index may be generated that references the root 
database node in the hierarchical database 204 and the indexable database nodes. The 
secondary index allows the indexable database nodes to be located using database 
queries. In this manner, a user or XML-enabled application using the hierarchical 
database 204 may locate an XML document 202 or portions thereof when the XML 
document 202 is stored in the hierarchical database 204 in intact format. 
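
The role of the secondary index can be illustrated with a small Python sketch. It does not reproduce how any particular database product implements secondary indexes. It assumes each indexable database node is reduced to a pair of an index value and the key of the root database node holding the intact document, and the names `build_secondary_index` and `side_segments` are hypothetical.

```python
def build_secondary_index(indexable_nodes):
    """Map each index value to the keys of the root nodes that hold it.

    `indexable_nodes` is a list of (index_value, root_node_key) pairs,
    one per indexable database node.
    """
    index = {}
    for index_value, root_node_key in indexable_nodes:
        index.setdefault(index_value, []).append(root_node_key)
    return index


# Hypothetical index values extracted from three intact documents.
side_segments = [("Ford", "doc-1"), ("Honda", "doc-2"), ("Ford", "doc-3")]
index = build_secondary_index(side_segments)

# Locate every intact document whose indexed value is "Ford".
ford_documents = index.get("Ford", [])
```

A lookup of this kind finds the stored document without parsing any of the binary data held in the database nodes.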

[0112] Figure 9 illustrates a flow chart of a method 900 for retrieving XML
documents 202 stored in an intact format within a hierarchical database 204. The method
900 uses a metadata schema 300 derived from the hierarchical database 204 or a view of
the hierarchical database 204. The metadata schema 300 includes a metadata element for
each database node within the hierarchical database 204. 

[0113] The method 900 begins 902 by receiving 904 a key. Preferably, the key is 
unique. The key is used to locate 906 a first database node within the hierarchical 
database 204 where the intact XML document 202 has been stored. Next, raw data is 
sequentially written 908 from the data portion of the first database node to an XML 
document 202 such as an XML file. Once all the raw data is written from the first 
database node, a determination 910 is made whether the first database node has a 
descendent database node storing more raw data. As mentioned, this may be indicated by 
a flag in the first database node. 

[0114] If more raw data exists for the XML document 202, the method 900 
locates the child database node and sequentially writes 912 raw data from the descendent 
database node into the XML document 202. If the descendent database node includes a 
descendent database node, the process of writing the data is repeated until all the raw data 
in all the descendent database nodes has been written to the XML document 202. In 
certain embodiments, if a descendent database node includes one or more database node 
twins (descendent database nodes of the same type as the current descendent database 
node), the process of writing the data is repeated on the database node twins such that all 
the raw data in the database node twins is written to the XML document 202 before a 
next descendent database node is selected. If no more raw data exists for the XML
document 202 in descendent database nodes or database node twins, the method 900 ends
914.
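
The retrieval order of method 900 can be sketched in Python as follows. The sketch assumes each database node is a dict holding its raw `data`, an optional list of `twins`, and an optional `child`. The presence of the `child` key stands in for the flag described above. The node layout and the function `read_intact` are hypothetical.

```python
def read_intact(node):
    """Concatenate raw data from a node, its twins, and its descendants.

    Each node is a dict with `data` (bytes), an optional `twins` list of
    same-type nodes read before descending, and an optional `child`.
    """
    parts = []
    while node is not None:
        parts.append(node["data"])
        for twin in node.get("twins", []):
            parts.append(twin["data"])      # twins come before the next child
        node = node.get("child")
    return b"".join(parts)


first_node = {
    "data": b"<doc><a>1",
    "child": {
        "data": b"</a><b>",
        "twins": [{"data": b"2</b>"}],
        "child": {"data": b"</doc>"},
    },
}

document = read_intact(first_node)
```

Because the raw data was never parsed or converted on the way in, simple concatenation in storage order reproduces the original document.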

[0115] In summary, the present invention provides an apparatus, system, and
method for passing data between an XML document and a hierarchical database.
The present invention allows for storage and retrieval of XML data and/or the XML
document in decomposed, intact, or mixed formats within a hierarchical database
without modifying the database or database server. The present invention allows for
indexing of an XML document or a sub-tree of the XML document when the XML 
document or sub-tree is stored in the hierarchical database in an intact format. 

[0116] The present invention may be embodied in other specific forms without 
departing from its spirit or essential characteristics. The described embodiments are to be 
considered in all respects only as illustrative and not restrictive. The scope of the 
invention is, therefore, indicated by the appended claims rather than by the foregoing 
description. All changes which come within the meaning and range of equivalency of the 
claims are to be embraced within their scope. 

[0117] What is claimed is: